Abstract

Evaluation of counterfactual queries (e.g., “If A were true, would C have been true?”) is important to fault diagnosis, planning, determination of liability, and policy analysis.

We present a method for evaluating counterfactuals when the underlying causal model is represented by structural models, a nonlinear generalization of the simultaneous equations models commonly used in econometrics and social sciences. This new method provides a coherent means for evaluating policies involving the control of variables which, prior to enacting the policy, were influenced by other variables in the system.

1  INTRODUCTION 

A  counterfactual  sentence  has  the  form 

If A were true, then C would have been true

where A, the counterfactual antecedent, specifies an event that is contrary to one’s real-world observations, and C, the counterfactual consequent, specifies a result that is expected to hold in the alternative world where the antecedent is true. A typical example is “If Oswald were not to have shot Kennedy, then Kennedy would still be alive,” which presumes the factual knowledge that Oswald did shoot Kennedy, contrary to the antecedent of the sentence.

Counterfactual reasoning is at the heart of every planning activity, especially real-time planning. When a planner discovers that the current state of affairs deviates from the one expected, a “plan repair” activity needs to be invoked to determine what went wrong and how it could be rectified. This activity amounts to an exercise of counterfactual thinking, as it calls for rolling back the natural course of events and determining, based on the factual observations at hand, whether the culprit lies in previous decisions or in some unexpected, external eventualities. Moreover, in reasoning forward to determine if things would have been different, a new model of the world must be consulted, one that embodies hypothetical changes in decisions or eventualities, hence, a breakdown of the old model or theory.

The logic-based planning tools used in AI, such as STRIPS and its variants or those based on the situation calculus, do not readily lend themselves to counterfactual analysis, as they are not geared for coherent integration of abduction with prediction, and they do not readily handle theory changes. Remarkably, the formal system developed in economics and social sciences under the rubric “structural equations models” does offer such capabilities but, as will be discussed below, these capabilities are not well recognized by current practitioners of structural models.1 The purpose of this paper is both to illustrate to AI researchers the basic formal features needed for counterfactual and policy analysis, and to call the attention of economists and social scientists to capabilities that are dormant within structural equations models.

Counterfactual thinking dominates reasoning in political science and economics. We say, for example, “If Germany were not punished so severely at the end of World War I, Hitler would not have come to power,” or “If Reagan did not lower taxes, our deficit would be lower today.” Such thought experiments emphasize an understanding of generic laws in the domain and are aimed toward shaping future policy making, for example, “defeated countries should not be humiliated,” or “lowering taxes (contrary to Reaganomics) tends to increase national debt.”

Strangely, there is very little formal work on counterfactual reasoning or policy analysis in the behavioral science literature. An examination of a number of econometric journals and textbooks, for example, reveals an imbalance: while an enormous mathematical machinery is brought to bear on problems of estimation and prediction, policy analysis (which is the ultimate goal of economic theories) receives almost no formal treatment. Currently, the most popular methods driving economic policy making are based on so-called reduced-form analysis: to find the impact of a policy involving decision variables X on outcome variables Y, one examines past data and estimates the conditional expectation $E(Y \mid X = x)$, where x is the particular instantiation of X under the policy studied.

1 These were clearly recognized, though, by the founding fathers of structural models, as can be seen in the publications of the Cowles Commission [Haavelmo, 1943] [Marschak, 1950] [Simon, 1953] but, with the exception of [Strotz and Wold, 1971], [Simon and Rescher, 1966] and [Fisher, 1970], have all but disappeared from the econometrics literature.

The assumption underlying this method is that the data were generated under circumstances in which the decision variables X act as exogenous variables, that is, variables whose values are determined outside the system under analysis. However, while new decisions should indeed be considered exogenous for the purpose of evaluation, past decisions are rarely enacted in an exogenous manner.2 Almost every realistic policy (e.g., taxation) imposes control over some endogenous variables, that is, variables whose values are determined by other variables in the analysis. Let us take taxation policies as an example. Economic data are generated in a world in which the government is reacting to various indicators and various pressures; hence, taxation is endogenous in the data-analysis phase of the study. Taxation becomes exogenous when we wish to predict the impact of a specific decision to raise or lower taxes. The reduced-form method is valid only when past decisions are nonresponsive to other variables in the system, and this, unfortunately, eliminates most of the interesting control variables (e.g., tax rates, interest rates, quotas) from the analysis.3

2 This distinction is often blurred in the literature. [Druzdzel and Simon, 1993], for example, state: “A variable is considered exogenous to a system if its value is determined outside the system, either because we can control its value externally (e.g., the amount of taxes in a macroeconomic model) or because we believe that this variable is controlled externally (like the weather in a system describing crop yields, market prices, etc.)” Still, our ability to externally control the value of a variable X does not render X exogenous for the purpose of legitimizing the reduced-form analysis: for $E[Y \mid X = x]$ to represent the impact of X = x on Y, X must also be independent of all implicit factors (disturbance terms) affecting Y.

While every economist knows that this disturbance-independence is a necessary condition for consistent estimation of structural parameters, most economists assume that disturbance-independence is a guaranteed property of controllable policy variables. A popular textbook [Intriligator, 1978], for example, mentions these two properties as if they were synonymous: “The exogenous variables are variables the values for which are determined outside the model but which influence the model. From a formal standpoint the exogenous variables are assumed to be statistically independent of all stochastic disturbance terms of the model, while the endogenous variables are not statistically independent of those terms. ... In general the exogenous variables are either historically given, policy variables, or determined by some separate mechanism.”

3 This problem is unrelated to the celebrated Lucas critique [Lucas, 1976], which concerns parameter changes due to economic agents becoming aware of interventions. The failure of reduced-form analysis extends to physical systems as well, where there are no rational agents to speak of, and where system parameters remain unaltered (except those under direct control).

This difficulty is not unique to economic or social policy making; it appears whenever one wishes to evaluate the merit of a plan on the basis of the past performance of other agents. Even when the signals triggering the past actions of those agents are known with certainty, a systematic method must be devised for selectively excluding the influence of those signals from the evaluation process. In fact, the very essence of evaluation is having the freedom to imagine and compare trajectories in various counterfactual worlds, where each world or trajectory is created by a hypothetical implementation of a policy that is free of the very pressures that compelled the implementation of such policies in the past.

A connection between counterfactuals and policy making was formulated in [Balke and Pearl, 1994b] using a simple device from action theory. In that formulation, the counterfactual antecedent is interpreted as a hypothetical minimal intervention that forces the counterfactual antecedent to hold true. If a system is modeled with structural equations (respectively, causal graphs), an intervention is simulated by severing all equations (causal edges) that correspond to (lead into) the antecedent variables and setting their values to those specified in the antecedent [Strotz and Wold, 1971]. A calculus for working with interventions in causal theories is given in [Pearl, 1994].

[Balke and Pearl, 1994b] provides background and motivation for the evaluation of counterfactual conditionals and briefly illustrates how the intervention scheme would handle counterfactuals in models represented by linear structural equations. This paper amends and expands the treatment of counterfactuals and policy making in any structural model for which the form of the equations is given. It also presents an example of their use in the area of econometrics, where apparently no adequate formalism for dealing with policy analysis has been proposed. In contrast to reduced-form analysis, our method allows evaluation of the consequences of intervening on economic attributes that are endogenous in normal operation only to become exogenous for the purpose of evaluation. For example, after developing the general techniques in Section 3, we will illustrate their use in Section 4 by evaluating the effect on the demand for some commodity when a government imposes price controls on that commodity for the first time.

2 REVIEW OF COUNTERFACTUAL ANALYSIS

In this section, the procedure for evaluating counterfactual conditionals in the context of structural equation models will be reviewed. We will then demonstrate this procedure on an example where the relationships among observed variables are deterministic, followed by an example that demonstrates how exceptions and disturbances incorporated into the model affect the analysis procedure.

Let $V = \{V_1, V_2, \ldots, V_n\}$ represent the set of variables for which data may be observed in a system. $U = \{U_1, U_2, \ldots, U_m\}$ will represent the disturbances, exceptions, and/or abnormalities influencing the observable variables V. For example, U could summarize the influence of many exogenous factors, such as the “price of tea in China” or “the local weather.” In general, each observable variable $V_i$ is a deterministic function of the form:

$V_i = f_i(V_1, V_2, \ldots, V_n, U_1, U_2, \ldots, U_m)$  (1)

The structure of the model defined by these equations may be depicted by a causal graph, where each variable on the left-hand side of a structural equation is the child of those variables on the right-hand side of the equation. A probability distribution over the disturbances, $P(u_1, u_2, \ldots, u_m)$, embodies the nondeterminism in the model. In general, this distribution is unconstrained; however, some classes of models, e.g., regression models, will assume that the disturbance variables $U_k$ are mutually independent.

A counterfactual conditional will be written

$a \rightarrow c \mid o$  (2)

and read as “Given that we have observed o, if a were true, then c would have been true.” The observations o consist of a set of value assignments to variables in V, e.g., $V_j = v_j$, $V_k = v_k$. The counterfactual antecedent, a, consists of a conjunction of value assignments to variables in V that are forced to hold true by external intervention. Typically, to justify being called “counterfactual”, a conflicts with o. Finally, the counterfactual consequent, c, stands for the proposition of interest, usually the values attained by some variables in the system.

The truth of a counterfactual conditional $a \rightarrow c \mid o$ may then be evaluated by the following procedure:

• Use the observations o to update the joint belief4 for all root nodes in the causal network. This joint belief summarizes the state of the system, because each non-root variable is a deterministic function of its causal influences.

• Replace the structural equation for each variable $V_k$ referred to in the counterfactual antecedent a with the equation $V_k = a_{V_k}$, where $a_{V_k}$ is the value of $V_k$ specified in a. This implements the local intervention that forces the counterfactual antecedent to hold true.

• Compute the solutions or belief of the consequent proposition c according to the modified set of structural equations.

Figure 1: Causal structure reflecting the influence that the Captain’s signal has on Bob’s firing and the Traitor’s health, and the direct influence that Bob’s firing has on the Traitor’s health.

This procedure will work whenever we have the functional form of the $f_i$’s, in which case the model is called parametric; otherwise, the model is called nonparametric. In particular, this paper concentrates on linear and boolean functions (e.g., Noisy-OR gates). In the case that the model is nonparametric, only bounds may be calculated for the belief of a counterfactual consequent [Balke and Pearl, 1994a].

To illustrate the intervention-based interpretation of counterfactuals, consider a firing squad with several riflemen (one called Bob) and a Captain who gives a signal to either shoot or release a prisoner charged with treason. The behavior of these agents is as follows:

•  The  Captain  waits  for  the  court  decision. 

•  Bob  typically  fires  his  rifle  if  and  only  if  the  Cap¬ 
tain  gives  the  signal  to  shoot. 

•  The  Traitor  typically  dies  if  and  only  if  the  Cap¬ 
tain  gives  the  signal  to  shoot  or  Bob  fires  his  rifle. 

Note  that  if  the  Captain  gives  the  signal  to  shoot  and 
Bob  does  not  fire,  the  traitor  will  typically  die  as  a 
result  of  the  other  riflemen  shooting,  but  these  inter¬ 
mediate  causes  will  not  be  made  explicit  in  this  story 
in  order  to  keep  the  model  simple. 

The generic causal structure that reflects this description is represented in Figure 1. The three variables C, B, and T have the following domains:

C ∈ {0 = Captain gives the signal to release the traitor, 1 = Captain gives the signal to shoot the traitor}

B ∈ {0 = Bob does not fire his rifle, 1 = Bob fires his rifle}

T ∈ {0 = Traitor lives, 1 = Traitor dies}


4  Here  we  use  the  generic  term  “belief”  to  refer  to  either 
truth  assignments  or  probabilities. 


The following subsections will demonstrate the evaluation of counterfactual conditionals under two variations of this model. The first assumes that the behaviors of the characters in the story are deterministic, while the second admits the occurrence of exceptions.

Figure 2: Causal structure (nodes: C, Captain’s signal; B, Bob’s firing; T, Traitor’s health) reflecting an external intervention that forces the state of Bob’s firing despite its normal causal influences, e.g., the Captain’s signal.

2.1  DETERMINISTIC  ANALYSIS 

A deterministic model is a special case of the general structural equation model where the disturbance variables set the values of the root nodes in the causal graph, i.e.,

$V_i = f_i(U_k)$  (3)

and the remaining observable variables (those that are not root nodes) are deterministic functions of the set of observable variables V, i.e.,

$V_i = f_i(V_1, V_2, \ldots, V_n)$  (4)

In a deterministic model, the firing-squad story may be concisely expressed by the following structural equations

$B = C$  (5)

$T = B \lor C$  (6)

Suppose that we observe Bob fire his rifle (b = 1) and the traitor expire (t = 1). If Bob were not to have fired (b = 0), would the traitor have lived (t = 0), i.e., does $b=0 \rightarrow t=0 \mid b=1, t=1$ hold true? Following the procedure previously outlined, the beliefs in the root nodes of the causal structure are first evaluated; in this case, did the Captain give the order to fire? Applying Eq. (5) allows us to abductively infer that the Captain must have given the order to fire (c = 1).

The structural equations (and hence the causal structure) are then modified to reflect an external intervention forcing Bob to have not fired (b = 0):

$B = 0$  (7)

$T = B \lor C$  (8)

Figure 2 depicts the causal structure reflecting this modified set of structural equations.

Finally, substituting our previously computed beliefs for the root nodes in the causal structure, i.e., that the Captain gave the order to fire (c = 1), we evaluate our belief in the traitor’s state of health. In our example query, we substitute c = 1 and b = 0 from the intervention into Eq. (8) to conclude that the traitor would still have died (t = 1). Therefore, the analysis leads to the statement, “given that Bob fired his rifle and the traitor died, if Bob had not fired his rifle, the traitor would still have died.”
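The three steps applied to this query can be traced in a few lines of Python. This is only an illustrative sketch: the dictionary of equations, the helper names `solve` and `counterfactual`, and the brute-force search over the single root variable C are assumptions made for the example, not part of the method's formal statement.

```python
# Structural equations of the deterministic firing-squad model, Eqs. (5)-(6).
# Each entry maps a non-root variable to a function of the current assignment.
EQUATIONS = {
    "B": lambda v: v["C"],            # Bob fires iff the Captain signals
    "T": lambda v: v["B"] or v["C"],  # Traitor dies iff Bob fires or Captain signals
}
ORDER = ["B", "T"]  # evaluation order (parents before children)


def solve(c, equations):
    """Compute all variables from the root value c under the given equations."""
    values = {"C": c}
    for name in ORDER:
        values[name] = int(equations[name](values))
    return values


def counterfactual(observations, antecedent, consequent_var):
    """Return the set of values the consequent can take in the counterfactual world."""
    # Step 1 (abduction): keep the root values consistent with the observations.
    roots = [c for c in (0, 1)
             if all(solve(c, EQUATIONS)[k] == v for k, v in observations.items())]
    # Step 2 (action): replace the equations of the antecedent variables by constants.
    modified = dict(EQUATIONS)
    for name, value in antecedent.items():
        modified[name] = lambda v, value=value: value
    # Step 3 (prediction): solve the modified model for each surviving root value.
    return {solve(c, modified)[consequent_var] for c in roots}


# Observed b = 1, t = 1; antecedent b = 0; consequent variable T.
result = counterfactual({"B": 1, "T": 1}, {"B": 0}, "T")
print(result)  # {1}: the traitor would still have died
```

Because the antecedent is imposed by replacing Bob's equation rather than by conditioning on b = 0, the inferred value c = 1 is left untouched by the intervention.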

This method for analyzing counterfactual conditionals was developed with the goal of preventing reasoning from the counterfactual antecedent variables to their ancestors in the causal structure, e.g., to conclude that the Captain would not have given the signal to shoot, if Bob did not fire. Such abductive reasoning is legitimate in an unchanged, typical world but does not reflect the subjunctive mood of the counterfactual, which invites unexpected eventualities (e.g., Bob failing to or deciding not to fire), similar to eventualities that are considered in decision making.

This solution is essentially the same as would be computed by [Simon and Rescher, 1966], who suppress abductive inference by invoking only forward inferences. Our method, which suppresses abduction by removing equations from the model, has two advantages. In the probabilistic analysis, our method permits the counterfactual computation using ordinary evidence propagation in a dual network [Balke and Pearl, 1994b]. Moreover, our proposal is also applicable to nonrecursive theories, as will be shown in Section 3.

2.2  ASSUMPTION-BASED  ANALYSIS 

In the previous subsection we assumed that there were no exceptions to the normal behaviors of each of the characters in the story. A more realistic model of the story would incorporate assumptions and exceptions that affect how each observable variable is affected by its observable causal influences. For example, in the firing-squad story, there may be exceptions to Bob’s firing his rifle in accordance with the Captain’s signal: his rifle may become jammed, preventing him from firing, or he may have had an itchy trigger finger. In addition, the traitor may have a cardiac arrest and die without anyone firing, or all the riflemen may miss the target. In order to accommodate these eventualities without explicating every possible scenario, we will write the structural equations with exception terms:

$B = (C \lor ab_{b1}) \land \neg ab_{b2}$  (9)

$T = (B \lor C) \land \neg ab_{t1} \lor ab_{t2}$  (10)

$ab_{b1}$ summarizes events that can cause Bob to fire even though the Captain did not give the order to fire, while $ab_{b2}$ summarizes those events that can prevent Bob from firing his rifle. Likewise, $ab_{t2}$ summarizes those events that can cause the Traitor to die even though Bob did not fire and the Captain did not give the order to fire, while $ab_{t1}$ summarizes those events that can prevent the Traitor from expiring even though the riflemen fired. These abnormality variables correspond to the set of disturbance variables U described in our definition of structural equations models.


If we apply the previous query to this assumption-based model, the same conclusion will be obtained, because the most believable world consistent with the observations contains no exceptions, which reduces Eqs. (9) and (10) to Eqs. (5) and (6). Therefore, we will work on a more complex query where abnormalities make a difference in the conclusion. Suppose that we observe the Captain give the signal to release the traitor (c = 0) and the Traitor expire (t = 1). Given these data, it is possible that Bob fired by accident. Now we ask: If Bob were not to have fired (b = 0), would the Traitor have lived (t = 0)5, i.e., does $b=0 \rightarrow t=0 \mid c=0, t=1$ hold true? As before, we compute updated beliefs for each root variable in the model given the observations. Our belief in C is already given by the observation, so we only need to compute our belief in the abnormality variables, e.g., $ab_{b1}$.

Qualitatively, those states of the world that minimize the number of abnormalities (exceptions) are to be assigned the highest belief. The fact that the Captain gave the release signal and the Traitor expired tells us that there is at least one abnormal condition. Indeed, there are exactly two assignments to the root variables that satisfy the observations and contain only one abnormal condition:

$(C = 0,\ ab_{b1} = 1,\ ab_{b2} = 0,\ ab_{t1} = 0,\ ab_{t2} = 0)$  (11)

$(C = 0,\ ab_{b1} = 0,\ ab_{b2} = 0,\ ab_{t1} = 0,\ ab_{t2} = 1)$  (12)

The effect of the external intervention that forces Bob not to fire his rifle is to be computed under these two states of the system. First the structural equations are modified to reflect the external intervention:

$B = 0$  (13)

$T = (B \lor C) \land \neg ab_{t1} \lor ab_{t2}$  (14)

Substituting the values from Eq. (11) into these equations leads to the belief that the Traitor would be alive. Intuitively, this particular state corresponds to the case where Bob had an itchy trigger finger and hence killed the Traitor; if Bob were prevented from firing, the mechanism responsible for the Traitor's death would have been disabled and the Traitor would have lived.

However, substituting the values from Eq. (12) into the revised structural equations leads to the alternative conclusion that the Traitor would still have died. In this state, the Traitor died from fright, and would have expired even if Bob were prevented from firing.

If the exceptions represented by $ab_{b1}$ are more likely than the exceptions represented by $ab_{t2}$, then we would choose to believe that the Traitor would have lived. Otherwise, we would conclude that the Traitor would still have expired.

5 This counterfactual conditional differs from most in that no direct observation has been made for the variable referred to in the counterfactual antecedent; hence, technically, the conditional may or may not be “counterfactual.” The interpretation of a local intervention on the antecedent variable, though, is still clear, and the analysis procedure can compute a meaningful belief for the counterfactual consequent.
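The assumption-based query can be checked by brute-force enumeration. The sketch below is illustrative only: it assumes that Eq. (10) is read as ((B ∨ C) ∧ ¬ab_t1) ∨ ab_t2, it uses "fewest abnormalities" as a stand-in for "highest belief", and the function names are hypothetical.

```python
from itertools import product

ABNORMALITIES = ["ab_b1", "ab_b2", "ab_t1", "ab_t2"]


def bob(c, u):
    # Eq. (9): B = (C or ab_b1) and not ab_b2
    return int((c or u["ab_b1"]) and not u["ab_b2"])


def traitor(b, c, u):
    # Eq. (10), read here as T = ((B or C) and not ab_t1) or ab_t2 (an assumption)
    return int(((b or c) and not u["ab_t1"]) or u["ab_t2"])


def minimal_worlds(c_obs, t_obs):
    """Abduction: abnormality assignments consistent with C = c_obs and T = t_obs
    that contain the fewest abnormal conditions."""
    consistent = []
    for bits in product((0, 1), repeat=len(ABNORMALITIES)):
        u = dict(zip(ABNORMALITIES, bits))
        if traitor(bob(c_obs, u), c_obs, u) == t_obs:
            consistent.append(u)
    fewest = min(sum(u.values()) for u in consistent)
    return [u for u in consistent if sum(u.values()) == fewest]


# Observations: Captain signals release (c = 0), Traitor dies (t = 1).
worlds = minimal_worlds(c_obs=0, t_obs=1)

# Action and prediction: force B = 0 (Eq. 13) and re-evaluate T (Eq. 14)
# in each minimal world, keyed by the abnormalities active in that world.
outcomes = {
    tuple(name for name in ABNORMALITIES if u[name]): traitor(0, 0, u)
    for u in worlds
}
print(outcomes)
```

The two keys correspond to the states of Eqs. (11) and (12): in the world where only ab_b1 holds the Traitor lives under the intervention, and in the world where only ab_t2 holds the Traitor still dies.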

3  LINEAR-NORMAL  MODELS 

The remainder of the paper will concentrate on models where the functions of Eq. (1) are linear and the disturbances are normally distributed. Some notation will be helpful for expressing background knowledge and counterfactual queries in this class of models. Upper case letters (e.g., Q) represent variables and the corresponding lower case letters (e.g., q) represent the values of those variables. When referring to a set of variables or values, we will use vector notation (e.g., $\vec{X}$ and $\vec{x}$); however, the arrow will be dropped whenever the variable is used as a subscript and its context is known. The distribution of variables in a linear structural equation model with Gaussian disturbances is fully specified by a mean vector ($\mu_x$) and a covariance matrix ($\Sigma_{x,x}$).

Counterfactual distributions will be denoted by $\mu_{c^*|\hat{a}^* o}$ and $\Sigma_{c^*,c^*|\hat{a}^* o}$, which may be read as the “mean and covariance of c given the observations o, if a were true (counterfactually).”

Assume that knowledge is specified by the linear structural equation model (often used in econometrics and the social sciences, and originally established by Sewall Wright in his development of path analysis [Wright, 1921])

$\vec{x} = B\vec{x} + \vec{\epsilon}$

where B is a matrix (not necessarily triangular) corresponding to a causal model (possibly cyclic), and we are given the mean $\mu_\epsilon$ and covariance $\Sigma_{\epsilon,\epsilon}$ of the disturbances $\vec{\epsilon}$ (assumed to be normal). The variables on the right-hand side of a structural equation are interpreted as the causal influences of the variable on the left-hand side of the equation. The mean and covariance of the observable variables $\vec{x}$ are then given by

$\mu_x = S \mu_\epsilon$  (15)

$\Sigma_{x,x} = S \Sigma_{\epsilon,\epsilon} S^{\top}$  (16)

where $S = (I - B)^{-1}$.

Under such a model, there are well-known formulas [Whittaker, 1990, p. 163] for evaluating the mean and covariance of $\vec{x}$ conditioned on some observations o:

$\mu_{x|o} = \mu_x + \Sigma_{x,o} \Sigma_{o,o}^{-1} (o - \mu_o)$  (17)

$\Sigma_{x,x|o} = \Sigma_{x,x} - \Sigma_{x,o} \Sigma_{o,o}^{-1} \Sigma_{o,x}$  (18)

where, for every pair of subvectors, $\vec{z}$ and $\vec{w}$, of $\vec{x}$, $\Sigma_{z,w}$ is the submatrix of $\Sigma_{x,x}$ with entries corresponding to the components of $\vec{z}$ and $\vec{w}$. Singularities of $\Sigma$ terms are handled by appropriate means.
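For intuition, Eqs. (17) and (18) reduce to simple arithmetic when a single variable is observed, since $\Sigma_{o,o}$ is then a scalar. The following sketch covers only that special case; the function name `condition_on_scalar` and the two-variable numbers are hypothetical, chosen for illustration.

```python
def condition_on_scalar(mu, sigma, k, value):
    """Condition a jointly normal vector on the single observation X[k] = value.

    mu is a list of means and sigma a list-of-lists covariance matrix.
    Implements Eqs. (17)-(18) for the case where Sigma_{o,o} is the
    scalar sigma[k][k].
    """
    n = len(mu)
    var_o = sigma[k][k]
    if var_o == 0:
        raise ValueError("observed variable has zero variance")
    # Eq. (17): mu_{x|o} = mu_x + Sigma_{x,o} Sigma_{o,o}^{-1} (o - mu_o)
    mu_post = [mu[i] + sigma[i][k] / var_o * (value - mu[k]) for i in range(n)]
    # Eq. (18): Sigma_{x,x|o} = Sigma_{x,x} - Sigma_{x,o} Sigma_{o,o}^{-1} Sigma_{o,x}
    sigma_post = [[sigma[i][j] - sigma[i][k] * sigma[k][j] / var_o
                   for j in range(n)] for i in range(n)]
    return mu_post, sigma_post


# Hypothetical example: zero means, unit variances, covariance 0.5; observe X[1] = 2.
mu_post, sigma_post = condition_on_scalar(
    [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], k=1, value=2.0)
print(mu_post)     # [1.0, 2.0]
print(sigma_post)  # [[0.75, 0.0], [0.0, 0.0]]
```

The observed component ends up with zero posterior variance, which is one instance of the singular $\Sigma$ terms mentioned above.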

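Similar formulas apply for the mean and covariance of $\vec{x}$ under an action $\hat{a}$. For mathematical convenience, let X be partitioned according to whether each variable is referred to in $\hat{a}$. The set of variables referred to in $\hat{a}$ is denoted by Z, and the set of remaining variables in X is denoted by Y. Under this partition, the matrix B can be partitioned into four submatrices:

$B = \begin{bmatrix} B_{yy} & B_{yz} \\ B_{zy} & B_{zz} \end{bmatrix}$

B is replaced by the action-pruned matrix $\hat{B} = [\hat{b}_{ij}]$, defined by

$\hat{b}_{ij} = \begin{cases} 0 & \text{if } X_i \in \hat{a} \\ b_{ij} & \text{otherwise} \end{cases}$

Equivalently,

$\hat{B} = \begin{bmatrix} B_{yy} & B_{yz} \\ 0 & 0 \end{bmatrix}$

According to intervention semantics [Pearl, 1994], all links from $\epsilon_z$ to Z are severed and Z is forced to the value $\hat{a}$. Therefore, the modified structural equation model for X when influenced by external actions is given by

$\vec{x} = (I - \hat{B})^{-1} \begin{bmatrix} \epsilon_y \\ 0 \end{bmatrix} + (I - \hat{B})^{-1} \begin{bmatrix} 0 \\ \hat{a} \end{bmatrix}$

Given the mean and covariance of $\epsilon_y$, the mean and covariance of the observable variables X may be evaluated:

$\mu_{x|\hat{a}} = \begin{bmatrix} \mu_{y|\hat{a}} \\ \mu_{z|\hat{a}} \end{bmatrix} = \begin{bmatrix} (I - B_{yy})^{-1} (\mu_{\epsilon_y} + B_{yz} \hat{a}_z) \\ \hat{a}_z \end{bmatrix}$  (19)

$\Sigma_{x,x|\hat{a}} = \begin{bmatrix} \Sigma_{y,y|\hat{a}} & \Sigma_{y,z|\hat{a}} \\ \Sigma_{z,y|\hat{a}} & \Sigma_{z,z|\hat{a}} \end{bmatrix} = \begin{bmatrix} (I - B_{yy})^{-1} \Sigma_{\epsilon_y,\epsilon_y} ((I - B_{yy})^{-1})^{\top} & 0 \\ 0 & 0 \end{bmatrix}$  (20)

To evaluate the counterfactual distribution $\mu_{x^*|\hat{a}^* o}$ and $\Sigma_{x^*,x^*|\hat{a}^* o}$, we first update the prior distribution of the disturbances by their distribution conditioned on the observations o:

$\mu^{o}_{\epsilon} \triangleq \mu_{\epsilon|o} = \mu_\epsilon + \Sigma_{\epsilon,o} \Sigma_{o,o}^{-1} (o - \mu_o) = \mu_\epsilon + \Sigma_{\epsilon,\epsilon} S_o^{\top} (S_o \Sigma_{\epsilon,\epsilon} S_o^{\top})^{-1} (o - \mu_o)$

$\Sigma^{o}_{\epsilon,\epsilon} \triangleq \Sigma_{\epsilon,\epsilon|o} = \Sigma_{\epsilon,\epsilon} - \Sigma_{\epsilon,o} \Sigma_{o,o}^{-1} \Sigma_{o,\epsilon} = \Sigma_{\epsilon,\epsilon} - \Sigma_{\epsilon,\epsilon} S_o^{\top} (S_o \Sigma_{\epsilon,\epsilon} S_o^{\top})^{-1} S_o \Sigma_{\epsilon,\epsilon}$

where $S_o$ is the submatrix of S containing all columns of S but only those rows corresponding to the observed variables in o.

We then evaluate the means $\mu_{x^*|\hat{a}^* o}$ and variances $\Sigma_{x^*,x^*|\hat{a}^* o}$ of the variables in the counterfactual world ($X^*$) under the action $\hat{a}^*$ using Eqs. (19) and (20), by replacing the prior distribution on the disturbances $\Sigma_{\epsilon_y,\epsilon_y}$ and $\mu_{\epsilon_y}$ with the posterior distribution $\Sigma^{o}_{\epsilon_y,\epsilon_y}$ and $\mu^{o}_{\epsilon_y}$:

$\mu_{x^*|\hat{a}^* o} = \begin{bmatrix} (I - B_{yy})^{-1} (\mu^{o}_{\epsilon_y} + B_{yz} \hat{a}_z) \\ \hat{a}_z \end{bmatrix}$  (21)

$\Sigma_{x^*,x^*|\hat{a}^* o} = \begin{bmatrix} (I - B_{yy})^{-1} \Sigma^{o}_{\epsilon_y,\epsilon_y} ((I - B_{yy})^{-1})^{\top} & 0 \\ 0 & 0 \end{bmatrix}$  (22)

It is clear that this procedure can be applied to non-triangular matrices, as long as S is nonsingular.

4 EXAMPLE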
Consider the econometric structural equation model described in [Goldberger, 1992]:

$q = b_1 p + d_1 i + u_1$  (23)

$p = b_2 q + d_2 w + u_2$  (24)

where q is the quantity of household demand for product A, p is the unit price of product A, i is household income, w is the wage rate for producing product A, $u_1$ is demand shock, and $u_2$ is supply shock.

We extend this model by incorporating an additional variable r, the household demand for some substitute product B, along with its structural equation

$r = b_3 p + u_3$

Let  B  stand  for  tea  and  A  for  coffee.  Consider  the 
following  set  of  counterfactual  queries: 

1. Find the expected demand for coffee (q) had coffee
prices (p) been controlled, say at p = $7.00?

2. Find the expected demand for coffee (q) had coffee
prices (p) been controlled, say at p = $7.00, assuming
the demand for tea subsequently reaches
r = 4?

3. Given that the current demand for tea (r) is r = 4,
find the expected demand for coffee (q) had coffee
prices (p) been controlled, say at p = $7.00?

Note the difference between queries 2 and 3. Query 2
states that the price intervention occurs prior to
our observation of product B's demand, while query 3
states that we first make an observation of product
B's demand and then intervene to force product A's
price.

The above counterfactual queries only involve the variables
X = [P, Q, R]; therefore, we may marginalize out
all remaining variables in Eqs. (23) and (24), only retaining
the distributions on the disturbance terms of P, Q, and R.
Because I and W are exogenous (root) variables
in the structural equations, we may combine I
and \(U_1\) into one disturbance variable \(\epsilon_q\). Likewise, W
and \(U_2\) may be combined into one disturbance variable
\(\epsilon_p\). The structural equations for analyzing the above
counterfactual queries may be reduced to


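\[ x = Bx + \epsilon: \qquad \begin{bmatrix} p \\ q \\ r \end{bmatrix} = \begin{bmatrix} 0 & b_2 & 0 \\ b_1 & 0 & 0 \\ b_3 & 0 & 0 \end{bmatrix} \begin{bmatrix} p \\ q \\ r \end{bmatrix} + \begin{bmatrix} \epsilon_p \\ \epsilon_q \\ \epsilon_r \end{bmatrix} \tag{25} \]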

The causal structure for this model is shown in Figure 3.


Figure 3: Causal structure of an econometric model
relating the demand for two products A and B and the
price of product A. The variables are related according
to the linear structural equations given in Eq. (25),
where the disturbances \(\epsilon_p\), \(\epsilon_q\), and \(\epsilon_r\) are independent
and normally distributed.


Because R and Q are d-separated [Pearl, 1988] by
P when the arrow Q → P is removed, the observation
of R after P's intervention has no impact on the
evaluation of Q's distribution. Therefore, the counterfactual
distribution of demand for coffee (Q) will be
the same for queries 1 and 2.
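
The claim can also be checked algebraically. As a sketch, assume the intervention fixes P at 7 and that the disturbances are independent, as stated in the caption of Figure 3. The structural equations for Q and R then become

\[ Q = 7 b_1 + \epsilon_q, \qquad R = 7 b_3 + \epsilon_r, \]

so that

\[ \mathrm{Cov}(Q, R) = \mathrm{Cov}(\epsilon_q, \epsilon_r) = 0. \]

For jointly normal variables, zero covariance means that conditioning on R = 4 after the intervention leaves the mean and variance of Q unchanged, which is why queries 1 and 2 give the same answer for Q.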


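Suppose that the parameters for this model are given
by

\[ B = \begin{bmatrix} 0 & 0.50 & 0 \\ -1.80 & 0 & 0 \\ 1.00 & 0 & 0 \end{bmatrix} \]

\[ \mu_{\epsilon} = \begin{bmatrix} 0 & 19.00 & 3.00 \end{bmatrix} \]

\[ \Sigma_{\epsilon,\epsilon} = \begin{bmatrix} 1.00 & 0 & 0 \\ 0 & 3.00 & 0 \\ 0 & 0 & 2.00 \end{bmatrix} \]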


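which reflects the following prior distribution on X =
[P, Q, R]:

\[ \mu_{x} = \begin{bmatrix} 5.00 & 10.00 & 8.00 \end{bmatrix} \]

\[ \Sigma_{x,x} = \begin{bmatrix} 0.48 & -0.08 & 0.48 \\ -0.08 & 1.73 & -0.08 \\ 0.48 & -0.08 & 2.48 \end{bmatrix} \]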


The  expected  price  of  coffee  is  $5.00,  while  the  average 
demand  for  coffee  and  tea  are  10  units  and  8  units, 
respectively. 
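
The prior moments above follow from the parameters through S = (I - B)^{-1}, with mean S mu_e and covariance S Sigma_e S^t. The following is a minimal sketch of that computation in plain Python, not the authors' code; the helper names (`matmul`, `transpose`, `inverse`) are illustrative, and the variable order [P, Q, R] is taken from the text.

```python
# Sketch: prior mean and covariance of X = [P, Q, R] from the
# structural parameters, via S = (I - B)^{-1}.

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting; assumes a is nonsingular."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(i == j) for j in range(n)]
           for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

# Parameter values as given in the text, variable order [P, Q, R].
B = [[0.0, 0.5, 0.0], [-1.8, 0.0, 0.0], [1.0, 0.0, 0.0]]
mu_e = [[0.0], [19.0], [3.0]]
Sigma_e = [[1.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 2.0]]

n = len(B)
S = inverse([[float(i == j) - B[i][j] for j in range(n)] for i in range(n)])
mu_x = [row[0] for row in matmul(S, mu_e)]            # S mu_e
Sigma_x = matmul(matmul(S, Sigma_e), transpose(S))    # S Sigma_e S^t

print([round(m, 2) for m in mu_x])
for row in Sigma_x:
    print([round(v, 2) for v in row])
```

Note that B here is not triangular (P and Q depend on each other), so the sketch also illustrates the remark that the procedure only needs I - B to be invertible.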

Query 1 is interested in determining the distribution
of demand for coffee (Q), given that no observations
have been made on the system, if we had intervened
to force the price of coffee to $7.00. Evaluating the
expressions in Eqs. (21) and (22), we obtain:

\[ \mu_{x^*|p=7} = \begin{bmatrix} 7.00 & 6.40 & 10.00 \end{bmatrix} \tag{26} \]

\[ \Sigma_{x^*,x^*|p=7} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 3.00 & 0 \\ 0 & 0 & 2.00 \end{bmatrix} \]


We conclude that the average household demand for
coffee and tea would be 6.4 units and 10 units, respectively,
if the price of coffee were $7.00.
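
The mean in Eq. (26) can be checked by hand. This is a sketch that assumes, since nothing has been observed, that the disturbances keep their prior mean, so that the computation takes the form of Eq. (19) with Z = P, Y = [Q, R], and \(\hat{a} = 7\). From the parameter values, \(B_{yy} = 0\) and \(B_{yz} = [-1.80 \;\; 1.00]^{t}\), which gives

\[ \mu_{y|\hat{a}} = (I - B_{yy})^{-1} (\mu_{\epsilon_y} + B_{yz}\hat{a}) = \begin{bmatrix} 19.00 \\ 3.00 \end{bmatrix} + 7 \begin{bmatrix} -1.80 \\ 1.00 \end{bmatrix} = \begin{bmatrix} 6.40 \\ 10.00 \end{bmatrix} \]

in agreement with the entries for Q and R in Eq. (26).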


Query 3 asks for the expected demand for
coffee (Q) had the price of coffee been controlled at
$7.00, given that demand for tea is currently 4 units.
Applying the expressions in Eqs. (21) and (22):

\[ \mu_{x^*|p=7,r=4} = \begin{bmatrix} 7.00 & 5.13 & 6.78 \end{bmatrix} \tag{27} \]

\[ \Sigma_{x^*,x^*|p=7,r=4} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 2.75 & -0.64 \\ 0 & -0.64 & 0.39 \end{bmatrix} \]


Note the importance of the observation of demand for
tea (R). In query 1, we found that forcing the price of
coffee (P) to $7.00 would reduce the expected demand
for coffee (Q) from 10 units to 6.4 units. The observation
of a 4 unit demand for tea changes the expected
demand for coffee to \(\mu_{q|r=4} = 10.13\) units; if we intervene
to force the price of coffee to $7.00, the expected
demand for coffee (Q) will be reduced from 10.13 to
5.13 units. Therefore, we see that enforcing a $7.00
price control on coffee would have a more adverse effect
on the demand for coffee under the knowledge that
the demand for tea was only 4 units. In addition, the
expected demand for tea would increase to 6.78 units
from the observed 4 units.
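
The two-step evaluation behind query 3 (condition the disturbances on the observation, then apply the action to the modified model) can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the function `counterfactual`, its dictionary interface, and the helper names are assumptions, and it requires at least one observation.

```python
# Sketch: counterfactual mean and covariance for a linear-normal model.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting; assumes a is nonsingular."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(i == j) for j in range(n)]
           for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def counterfactual(B, mu_e, Sigma_e, obs, act):
    """obs and act map a variable index to its observed or forced value."""
    n = len(B)
    eye = [[float(i == j) for j in range(n)] for i in range(n)]
    S = inverse([[eye[i][j] - B[i][j] for j in range(n)] for i in range(n)])

    # Step 1: condition the disturbances on the observations o.
    rows = sorted(obs)
    S_o = [S[i] for i in rows]                    # observed rows of S
    mu_o = matmul(S_o, [[m] for m in mu_e])
    resid = [[obs[i] - mu_o[k][0]] for k, i in enumerate(rows)]
    ES = matmul(Sigma_e, transpose(S_o))          # Sigma_e S_o^t
    K = matmul(ES, inverse(matmul(S_o, ES)))      # ... (S_o Sigma_e S_o^t)^-1
    shift = matmul(K, resid)
    mu_hat = [mu_e[i] + shift[i][0] for i in range(n)]
    KSE = matmul(K, transpose(ES))                # relies on Sigma_e symmetric
    Sig_hat = [[Sigma_e[i][j] - KSE[i][j] for j in range(n)] for i in range(n)]

    # Step 2: sever the controlled variables and force their values.
    B_hat = [[0.0] * n if i in act else list(B[i]) for i in range(n)]
    for i, value in act.items():
        mu_hat[i] = value
        for j in range(n):
            Sig_hat[i][j] = Sig_hat[j][i] = 0.0
    S_hat = inverse([[eye[i][j] - B_hat[i][j] for j in range(n)]
                     for i in range(n)])
    mean = [r[0] for r in matmul(S_hat, [[m] for m in mu_hat])]
    cov = matmul(matmul(S_hat, Sig_hat), transpose(S_hat))
    return mean, cov

# Query 3: observe r = 4 (index 2), then force p = 7 (index 0).
B = [[0.0, 0.5, 0.0], [-1.8, 0.0, 0.0], [1.0, 0.0, 0.0]]
Sigma_e = [[1.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 2.0]]
mean, cov = counterfactual(B, [0.0, 19.0, 3.0], Sigma_e, {2: 4.0}, {0: 7.0})
print([round(m, 2) for m in mean])
```

With the parameter values from the text, the result agrees with Eq. (27) and the accompanying covariance up to rounding.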


If we believe that the disturbance on the demand for
coffee (\(\epsilon_q\)) changes slowly, or at least changes infrequently,
then we can use the results of this counterfactual
distribution to determine whether price controls
should now be imposed to meet our needs. In other
words, the counterfactual distribution will tell us how
we expect variables' distributions to change as a result
of an external intervention applied in the present.


It is important to note the difference between counterfactual
distributions (conditioned on observations and
external intervention) and distributions simply conditioned
on observations. Consider the distribution that
would be computed from observing the price of coffee
at $7.00 (p = 7) or from observing the demand for tea
at 4 units and the coffee price at $7.00 (r = 4, p = 7):


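\[ \mu_{x|p=7} = \begin{bmatrix} 7.00 & 9.66 & 10.00 \end{bmatrix} \tag{28} \]

\[ \Sigma_{x,x|p=7} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1.71 & 0 \\ 0 & 0 & 2.00 \end{bmatrix} \tag{29} \]

\[ \mu_{x|r=4,p=7} = \begin{bmatrix} 7.00 & 9.66 & 4.00 \end{bmatrix} \tag{30} \]

\[ \Sigma_{x,x|r=4,p=7} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1.71 & 0 \\ 0 & 0 & 0 \end{bmatrix} \tag{31} \]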


Contrast the expected demand for coffee evaluated
from these conditional distributions with that expected
had the price of coffee been fixed by external
intervention. In particular, compare Eq. (28) to
Eq. (26) and Eq. (30) to Eq. (27). One reason it is
incorrect to use distributions conditioned on observations
for evaluating (economic) policies is that such
distributions convey false information about the
post-intervention state of the disturbances. Accounting for
the pre-intervention values of the controlled variables,
which convey correct information about those disturbances,
is therefore important for properly evaluating
the effect of the intervention.
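
The gap between Eq. (28) and Eq. (26) can be reproduced with a few lines of arithmetic. The sketch below uses the reduced form of the two simultaneous equations for P and Q (solved by hand here, which is an addition of this sketch rather than a step shown in the paper) and compares observing p = 7 with forcing p = 7; variable names are illustrative.

```python
# Sketch: observing P = 7 versus forcing P = 7 in the coffee model.
b1, b2 = -1.8, 0.5             # q = b1*p + e_q,  p = b2*q + e_p
mu_ep, mu_eq = 0.0, 19.0       # disturbance means
var_ep, var_eq = 1.0, 3.0      # disturbance variances (independent)

# Reduced form: p = (e_p + b2*e_q)/det, q = (b1*e_p + e_q)/det.
det = 1.0 - b1 * b2
mu_p = (mu_ep + b2 * mu_eq) / det
mu_q = (b1 * mu_ep + mu_eq) / det
var_p = (var_ep + b2 ** 2 * var_eq) / det ** 2
cov_pq = (b1 * var_ep + b2 * var_eq) / det ** 2

p = 7.0
# Conditioning on an observed price (the Q entry of Eq. (28)).
q_observed = mu_q + cov_pq / var_p * (p - mu_p)
# Forcing the price: the equation for P is replaced, so Q = b1*p + e_q
# with e_q at its prior mean (the Q entry of Eq. (26)).
q_forced = b1 * p + mu_eq

print(round(q_observed, 2), round(q_forced, 2))
```

Observation treats a high price as evidence about the disturbances, while intervention leaves the disturbances at their prior and only changes the price, which accounts for the roughly 9.66 versus 6.40 difference.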

5  CONCLUSION 

This paper has addressed the inadequacy of current
techniques in econometrics and the social sciences for
evaluating the potential effects of economic and social
policies. Current techniques fail to correctly evaluate
policies that control endogenous variables, that is,
variables that are influenced by other variables in the
system prior to enacting the policy.

We have addressed this deficiency by developing and
applying a formalism for evaluating counterfactual
conditionals in structural equation models. This
method is applicable to the analysis of policies, even
when the policy dictates intervention on an endogenous
variable. An example was presented that demonstrates
the disparity between analyses based on counterfactuals
and reduced-form analysis, which treats
intervention as an observation on controlled variables.

The technique developed in this paper should also be
applicable to AI problems in situations where a strategy
is to be evaluated on the basis of structural equations
with a given functional form. Examples are presented
for causal models using Boolean functions, with
and without exceptions.